Whole-genome duplication in an algal symbiont bolsters coral heat tolerance

The algal endosymbiont Durusdinium trenchii enhances the resilience of coral reefs under thermal stress. D. trenchii can live freely or in endosymbiosis, and the analysis of genetic markers suggests that this species has undergone whole-genome duplication (WGD). However, the evolutionary mechanisms that underpin the thermotolerance of this species are largely unknown. Here, we present genome assemblies for two D. trenchii isolates, confirm WGD in these taxa, and examine how selection has shaped the duplicated genome regions using gene expression data. We assess how the free-living versus endosymbiotic lifestyles have contributed to the retention and divergence of duplicated genes, and how these processes have enhanced the thermotolerance of D. trenchii. Our combined results suggest that lifestyle is the driver of post-WGD evolution in D. trenchii, with the free-living phase being the most important, followed by endosymbiosis. Adaptations to both lifestyles likely enabled D. trenchii to provide enhanced thermal stress protection to the host coral.


INTRODUCTION
Uncovering the foundations of biotic interactions, particularly symbiosis, remains a central goal for research, given that virtually no organism lives in isolation.Coral reefs are marine biodiversity hotspots that are founded upon symbioses involving dinoflagellate algae in the family Symbiodiniaceae (1).These symbionts are the "solar power plants" of reefs, providing photosynthetically fixed carbon and other metabolites to the coral host (2,3).Breakdown of the coral-dinoflagellate symbiosis (i.e., coral bleaching), often due to ocean warming, puts corals at risk of starvation, disease, and eventual death.Symbiodiniaceae microalgae are diverse, with at least 15 clades including 11 named genera (1,(4)(5)(6), encompassing a broad spectrum of symbiotic associations and host specificity.Most of these taxa are facultative symbionts (i.e., they can live freely or in symbiosis), although exclusively symbiotic or free-living species also exist (1).Genomes of Symbiodiniaceae are believed to reflect the diversification and specialization of these taxa to inhabit distinct ecological niches (7,8).The genomes of symbionts, due to spatial confinement, are predicted to undergo structural rearrangements, streamlining, and rapid genetic drift (e.g., pseudogenization) (7).These traits are present in symbiotic Symbiodiniaceae (8).
Whole-genome duplication (WGD) is an evolutionary mechanism that generates functional novelty and genomic innovation (9,10) and can occur due to errors in meiosis, i.e., via autopolyploidy.Following WGD, the evolutionary trajectory of duplicated sequence regions generally proceeds from large-scale purging, temporary retention and/or divergence, to fixation (11).WGD-derived duplicated genes [i.e., ohnologs (12,13)] that are retained can provide a selective advantage and enhance fitness through increased gene dosage, specialization in function, and/or the acquisition of novel functions (11).
WGD has been described in free-living unicellular eukaryotes such as yeast (14)(15)(16), ciliates (17,18), and diatoms (19,20), but not in symbiotic species.Evidence of WGD is absent in the Symbiodiniaceae, except for the genus Durusdinium, as observed in microsatellite sequence data (21)(22)(23).This genus includes the thermotolerant species Durusdinium trenchii (Fig. 1A), a facultative symbiont that confers heat tolerance to corals, thereby enhancing holobiont resilience under thermal stress (24,25).We hypothesize that WGD played a critical role in enhancing the capacity of this symbiont to confer heat tolerance to host species.Specifically, the facultative lifestyle (i.e., free-living or symbiotic) of D. trenchii favored fixation of WGD both during the free-living phase as an adaptation to fluctuating environmental conditions, and the symbiotic phase with an expanded gene inventory being further modified by the coral or other host species (26).Here, we generated de novo genome assemblies from two isolates of D. trenchii and analyzed their evolutionary trajectories.On the basis of gene expression profiles, we elucidate how the facultative lifestyle has contributed to the fate of ohnologs in these microalgae, and how natural selection acting on gene families has increased thermotolerance of corals hosting D. trenchii symbionts.These data provide strong evidence for the dual lifestyle hypothesis as a driver of post-WGD genome evolution.

Whole-genome duplication in a coral endosymbiont
We generated de novo genome assemblies from D. trenchii CCMP2556 (total length = 1.71Gb; N50 = 774.26kb) and D. trenchii SCF082 (total length = 1.64 Gb; N50 = 398.48kb) using 10x Genomics linked reads (tables S1 and S2).The two genomes are highly similar in terms of whole-genome sequence (~99.7%shared identity, comparable to genomes of multiple isolates of single Symbiodiniaceae species; table S3) (8,27,28), size (table S4), and repeat landscapes (Fig. 1B and fig.S1), yielding ~54,000 protein-coding genes (table S5) with a comparable level of data completeness to other genome assemblies of Symbiodiniaceae (table S6; see Materials and Methods).To assess WGD in D. trenchii, we followed González-Pech et al. (8) to identify collinear gene blocks within each genome (Fig. 1C); these blocks likely arose via segmental duplication and/or WGD.We identified 864 blocks implicating 27,597 (49.46% of the total 55,799) genes in CCMP2556, and 776 blocks implicating 18,209 (34.02% of the total 53,519) genes in SCF082 (tables S7 and S8).The proportion of genes present in collinear blocks in D. trenchii is ~49fold greater (Fig. 1D) than that in other Symbiodiniaceae and the outgroup dinoflagellate Polarella; these taxa have not experienced WGD, as also observed in an earlier study incorporating genomes of the free-living Symbiodiniaceae lineage, Effrenium voratum (28).We also observed a high extent of conserved synteny in D. trenchii (22, S9).Using homologous protein sets derived from available whole-genome data, our inference of lineage-specific duplicated genes revealed 7945 gene duplication events specific to D. trenchii, which is greater than those in any other Symbiodiniaceae lineages on the tree (Fig. 1F).
Examination of the overall distribution of DNA synonymous substitutions (K s ) showed a distinct peak (fig.S3), as expected following WGD; the small peak values are explained by the recency of this event in D. trenchii (29).The WGD likely occurred after the split of D. trenchii from its sister Durusdinium glynnii 0.11 million to 1.93 million years ago, based on large subunit ribosomal RNA genetic divergence estimates (1).Our analysis of whole-genome data following Ladner et al. (30) aligns with these estimates of a Pleistocene origin in the Indo-Pacific (see Supplementary Text), a period of frequent sea-level changes in this region (31).These results, based on independently assembled genomes from two isolates, combined with the extent and size of the gene blocks (table S7 and fig.S2), provide unambiguous evidence for WGD in D. trenchii.

Asymmetric divergence of ohnolog-pair expression
To assess putative ohnolog functions in D. trenchii, we analyzed transcriptome data of CCMP2556 (32) that were generated from free-living cells in culture and from cells in endosymbiosis with the anemone Exaiptasia pallida, both under ambient (28°C) and thermal stress (34°C) conditions.These data (32), although generated from an experiment using the anemone host of Exaiptasia as the model for cnidarian symbiosis with Symbiodiniaceae (33), provide insights into how D. trenchii as a successful coral symbiont responds to heat stress during free-living stage and symbiotic stage.We focused on 6147 expressed ohnolog pairs that were supported by 10 or more mapped transcripts in ≥50% of the samples and inferred gene coexpression networks (fig.S4 and table S10) using weighted gene coexpression network analysis (WGCNA) (34).Most [4412 (71.7%) of 6147] ohnolog pairs were recovered in different networks, indicating the prevalence of expression divergence between duplicates post-WGD.We then classified ohnolog pairs into five groups based on their differential expression (DE) patterns (Fig. 2A and figs.S5 to S9; see Materials and Methods).Each group exhibited different characteristics (table S11) relative to expression (Fig. 2, A and B), sequence similarity (fig.S10), gene structure (i.e., exon gain/loss; Fig. 2C), and/or alternative splicing (figs.S11 and S12); see Supplementary Text.Ohnolog pairs that were differentially expressed between lifestyles observed in only one copy [group 2; 2244 (36.5%);Fig. 2D], and those with opposing DE observed in any one comparison [group 5; 100 (1.6%); Fig. 2E and table S12] exhibited strongly contrasting expression profiles (most Pearson's correlation coefficients < 0; Fig. 2B).They showed significantly elevated levels of positive selection (K a /K s = 3.31; table S11), exon gain/loss, sequence divergence, and differential exon usage (DEU; table S13) relative to the other three groups (pairwise Wilcox rank sum tests, P < 0.05; see Supplementary Text); these differences were not attributed to the differing number of splice junctions per gene, and ohnologs show greater extent of alternative splicing than singletons (fig.S13 and table S14).
Divergence of expression between ohnologs within a pair can have different outcomes, including the change in expression specificity, an important mechanism for adaptation after WGD.We assessed expression specificity using the τ index (35) that ranges between 0 (i.e., broad expression, low specificity) to 1 (i.e., narrow expression, high specificity) for all genes that passed the WGCNA quality filtering.We identified 3508 genes of high expression specificity (τ > 0.7), of which 1893 (53.96%) were ohnologs (table S15).Compared to singletons and other duplicate types (except for proximal duplicates), the ohnologs exhibited significantly elevated τ (fig.S14, Kruskal-Wallis test; P < 10 −5 ), indicating narrow expression profiles that are more specialized to distinct conditions.This divergence in expression was observed in group 2 pairs (fig.S6), for which the differentially expressed copy in each pair showed higher τ and variance in expression relative to its counterpart (Kruskal-Wallis test; P < 10 −15 ).Whereas most instances of specialized expression are associated with the free-living lifestyle (table S16), this specialization reflects a response to temperatures among the dispersed duplicates (i.e., duplicates separated by >20 genes; chi-square test post hoc: P = 0, residuals = 6.15 at 34°C) and the ohnologs (P < 0.05, residuals = 3.10 at 28°C).Only ohnologs displayed a tendency toward expression specificity (τ approximates 1) in the symbiotic lifestyle at both 28°C (chi-square test post hoc: P < 0.01, residuals = 3.61) and 34°C (P < 0.01, residuals = 3.89; table S16).Although the post-WGD specialization described here relates to lifestyle, parallels are known in multicellular organisms whereby the partitioning of expression across spatiotemporal scales is often observed in different tissues, organs, or developmental stages (11,36).In post-WGD yeasts, this trend may represent the uncoupling of noise and plasticity in gene expression that enables dynamic geneexpression responses in one of the two duplicates (37).In D. trenchii, this trait may provide greater flexibility in gene expression when cells are free-living or experiencing temperature stress.
The up-regulation or specialization of gene expression by some ohnolog pairs to different lifestyles in D. trenchii appears to either be mediated by or coincide with alterations in exon organization.Evolution of WGD-derived genes can lead to loss and/or diversification of alternative spliced forms, and the partitioning of ancestral splice forms between gene duplicates.We investigated this issue by examining the interplay of sequence conservation in exonic sequences with patterns of splice junction conservation (table S17) and DEU within ohnolog pairs.On the basis of the mean percentage of shared exons per pair, the ohnolog pairs in group 5 (13.35%) had lower exonic conservation compared to other groups (>16%).DEU across all groups was biased toward functions associated with the free-living lifestyle.Nearly all ohnolog pairs in group 5 (95 of 100; Fig. 2E and fig.S9) displayed contrasting DE between free-living and symbiotic phases (e.g., a gene copy was up-regulated during the free-living phase, whereas the other was up-regulated in the symbiotic phase) at one or both temperatures.This result underscores lifestyle as a strong driver of expression divergence (see Supplementary Text).Group 5 ohnologs that are specialized for the symbiotic lifestyle exhibited lower overall DEU and had fewer exons than their counterparts that were up-regulated under the free-living lifestyle (Wilcoxon rank sum test, P = 0.015, V = 2435.5).These ohnologs also contained exons that were more dominantly expressed during the symbiotic lifestyle (Wilcoxon rank sum test, P = 0.02798, V = 2540; Fig. 3A).Such a bias in DEU composition toward symbiosis-specialized exons was not observed in the other groups, e.g., group 2 (Fig. 3, B and C).
Consequently, the symbiosis-specific DEU, together with the overall decrease in per-gene exons and DEU among symbiosis-associated ohnologs in group 5, suggests a symbiosis-specific streamlining of gene function.Together with our observation of RNA editing (Supplementary Text, fig.S15, and tables S18 and S19), these results collectively indicate that alterations to gene structure and alternative splicing drive expression divergence of ohnologs in D. trenchii that are explained by algal lifestyle.
Although genomic streamlining is usually associated with obligate endosymbionts rather than facultative symbionts, gene duplication may facilitate streamlining in one of the two duplicates in favor of a symbiotic lifestyle.Ohnolog pairs of group 5 were significantly enriched for key functions (table S20), such as the processing of glutamine and production of the key antioxidant of glutathione, which has been linked to nitrogen cycling associated with symbiosis (38,39); the implicated genes include glutamine synthetase and S-formylglutathione hydrolase (table S12).These results suggest that following WGD, specialization of gene expression to distinct conditions may also be enabled by the streamlining of functions and specialization to symbiosis.In contrast, for their duplicated counterparts, the evolution of greater functional flexibility may reflect selection during the free-living phase.

Partitioned functionality in central metabolic pathways
WGD enables the retention of complete expression networks.Of the 19 inferred coexpression networks (table S10), different gene duplication types displayed preferential distributions to WGCNA modules (P < 2.2 × 10 −16 , χ 2 = 525.63).Singletons and ohnologs were biased toward contrasting coexpression networks, with singletons predominantly associated with networks linked to the symbiotic lifestyle (M1, M8, and M17 in table S21), and ohnologs with networks linked to a free-living lifestyle (M2, M5, and M6).This result suggests that genes preferentially retained as ohnologs were expressed at contrasting times, compared to those that were lost such that the remaining copies become singletons.DE of ohnologs was observed at the greatest magnitude between lifestyles during heat stress at 34°C (chi-square test post hoc: P < 10 −3 , residuals = 4.03; table S22); this may explain, in part, how D. trenchii can establish itself or increase in abundance in new hosts both during and after heat waves (40)(41)(42)(43)(44).These contrasting patterns of singleton and ohnolog membership across coexpression networks indicate a strong association of ohnolog retention with expression networks that are tightly linked to the free-living lifestyle.
We investigated the retention of complete metabolic pathways in both D. trenchii isolates.Of the 98 pathways retained in duplicate (table S23), specialization driven by lifestyle was detected in central metabolic pathways (figs.S16 to S23) (45), such as glycolysis/gluconeogenesis (Fig. 3D and fig.S16).Ohnolog specialization in glycolysis/gluconeogenesis reflects the contrasting functions of this pathway during the symbiotic versus free-living phases.That is, a high rate of gluconeogenesis, inferred using ohnolog expression data, supplies glucose for translocation to the coral host during symbiosis, whereas a high rate of glycolysis fuels the energetic needs of free-living cells that tolerate more variable environments (7).Although most enzymes were encoded by group 2 ohnologs (for which one gene copy was differentially expressed between lifestyles; Fig. 3E), a key ratelimiting enzyme of gluconeogenesis and the Calvin cycle, fructose 1,6-bisphosphatase, was differentially expressed in response to heat stress in symbiosis.The development of minor or partitioned functionality following WGD has been described in duplicate glycolysis pathways (46).In yeast, these pathways diverged and became semiindependent, with each specialized for low and high glucose levels (46).In D. trenchii, this might allow fine-tuning of carbon metabolism to the contrasting energetic needs of a dual lifestyle.

DISCUSSION
Our results provide strong evidence that the dual lifestyle has been a key driver of post-WGD genome evolution in the dinoflagellate D. trenchii.Our working hypothesis is illustrated in Fig. 4.
Under the null hypothesis of a solely free-living lifestyle, we expect post-WGD adaptations to primarily be driven by fluctuating environmental conditions (e.g., nutrient availability).Under the hypothesis of a dual lifestyle that includes symbiosis, adaptations will also strengthen the maintenance of a stable host-symbiont relationship and efficient nutrient/metabolite exchange within the coral holobiont.Although our results provide stronger support for the free-living phase as the primary driving force behind post-WGD evolution, both lifestyles impact the maintenance and expression divergence of ohnologs.These combined selective forces increase the overall fitness in D. trenchii, with the greater expression divergence of ohnologs under elevated temperatures a contributor to the high thermotolerance of this species when it is in symbiosis with corals (47).Benefits conferred by WGD to a free-living lifestyle in more variable environments, as well as tailoring of post-WGD duplicates to different lifestyles, primed D. trenchii to persist longer in the coral holobiont when faced with thermal stress.Whether symbiosis may also have negative effects on fitness post-WGD is unknown (48).It should be noted that the dual lifestyle is widespread in Symbiodiniaceae ( 1), but WGD is not.Although other facultative symbionts within Symbiodiniaceae (e.g., Cladocopium thermophilum and Durusdinium glynnii) are also known for their thermotolerance (49-51), WGD was not implicated in these lineages (8,27).Therefore, the key feature of D. trenchii that we are addressing is not dual lifestyle alone, but rather how the capacity for dynamically switching between the symbiotic versus free-living phase affects post-WGD genome evolution and adaptation.Because Symbiodiniaceae propagate to very high densities in coral tissues (10 5 to 10 6 cells/cm 2 ) (52, 53), the symbiotic phase of D. trenchii allows a rapid increase in the population size, particularly of fast-growing genotypes, while resident in host tissues.Consequently, genotypes that have faster growth rates or greater resilience to heat due to WGDderived adaptations can re-seed free-living populations upon dissociation from the coral due to colony death, bleaching, or other mechanisms of symbiont population control.Repeated cycles of symbiosis followed by the free-living phase may therefore increase the overall fitness of D. trenchii populations under the dual lifestyle (26).Retention of multiple gene copies combined with fixed, adaptive changes likely makes D. trenchii more capable of metabolic maintenance under dynamic, often stressful environments, and hence a more resilient symbiont.Such factors may, in turn, explain the large geographic and expanded host range of D. trenchii (23) and its well-known capacity for increasing coral survival under heat waves.Therefore, in an intriguing and unexpected twist, WGD, primarily driven by selection under a free-living life phase has converted D. trenchii into a coral symbiont able to protect the host coral from thermal stress during symbiosis.D. trenchii is also a valuable model for studying the genome-wide impacts of facultative lifestyles.

MATERIALS AND METHODS
De novo genome assembly and prediction of protein-coding genes D. trenchii strains CCMP2556 and SCF082 (previously designated UTSD amur-D-MI) originally isolated from an Orbicella faveolate and Acropora muricata coral colonies, respectively, were each separately cultured and genomic DNA-extracted for genomic sequencing (see Supplementary Text).Chromium libraries were generated for 10x linked-read sequencing and yielded a total of 236.45 gigabase pairs (Gbp) for CCMP2556 and 212.03Gbp for SCF082.We assessed the ploidy of D. trenchii using k-mers and GenomeScope2 (54), which revealed a distinctive single peak in both isolates indicating a haploid genome as seen in other Symbiodiniaceae (fig.S24).
For each isolate, a preliminary draft genome was assembled de novo using 10x Genomics Supernova v2.1.1.For CCMP2556, the estimated genome coverage (~100×) exceeded the optimal range (38 to 56×) of the Supernova assembler; we subsampled the 1.6 billion reads to 600 million reads (~60× coverage).For SCF082, coverage estimates were observed to be affected due to the presence of contaminant DNA from microbial sources in the sequencing reads; the de novo assembly was generated using all 1.4 billion reads with the flag -accept_extreme_coverage.
The presence of putative contaminant scaffolds in the supernova assemblies was investigated using a comprehensive approach adapted from Iha et al. (55) informed by read coverage, G + C content, taxonomic designation, and de novo transcriptome mapping.Taxonannotated G + C-coverage plots (fig.S25) were generated using the BlobTools suite v1.1 (56) to identify scaffolds in each assembly that deviated by read coverage, taxonomic sequence similarity, and/or G + C content.Read coverage was assessed using BWA v0.7.17, based on mapping of quality-trimmed reads [Longranger v2.2.2 (57) ran at default setting] to the genome assembly.The taxonomic identity of scaffolds was assigned on the basis of BLASTN search (E ≤ 10 −20 ) against genome sequences from bacteria, archaea, viruses, and alveolates in the NCBI nucleotide database (released 10 May 2021).De novo transcriptome assemblies were mapped to the genome assemblies using minimap2 v2.18 (58) within which we have modified the codes to account for noncanonical splice sites of dinoflagellates.Scaffolds that were designated as non-dinoflagellate were removed from the assemblies if they lacked mapped transcripts from the corresponding de novo transcriptome assembly, or when <10% of mapped transcripts indicated evidence of introns in the genomes.We considered a scaffold as a putative contaminant if (i) its sequence coverage or G + C content is not within the 1.5 × interquartile range and (ii) it lacks any transcript support defined above.Upon removal of these putative contaminant sequences from the CCMP2556 assembly, the filtered assembly was incorporated in the database as the D. trenchii reference for assessing the assembled scaffolds of SCF082 using the same approach.
Publicly available RNA sequencing (RNA-seq) data from previous studies of CCMP2556 (32) and SCF082 (59) were used to further scaffold the assembled genome sequences (see Supplementary Text).RNA-seq reads for both isolates were first quality-trimmed using fastp (60) (mean Phred quality ≥30 across a 4-bp window; minimum read length of 50 bp).For each isolate, the filtered reads were assembled de novo using Trinity v2.11.0 (61) independently for each treatment.The transcriptome assemblies for CCMP2556 (791,219 total transcripts) and those for SCF082 (355,411 total transcripts) were mapped to the filtered genome assemblies using mini-map2 v2.18 (58) that was modified to recognize the noncanonical splice sites of dinoflagellates.The mapped transcripts were then used to scaffold the filtered genome assemblies with L_RNA_Scaffolder (62) at default parameters.
A second round of scaffolding was then performed with ARBitR (63), which incorporates the distance information from linked-read sequencing data when merging and scaffolding assemblies.Longranger BASIC quality-trimmed linked genome reads (outputs from the standard 10x Genomics data workflow) were mapped to the scaffolded genome assemblies for ARBitR scaffolding, yielding the final genome assemblies: CCMP2556 (assembly size = 1.70 Gb; N50 = 750 kb; 29,137 scaffolds) and SCF082 (assembly size = 1.64 Gb; N50 = 398.5 kb; 44,682 scaffolds) (table S2).The CCMP2556 assembly is the most contiguous reported in Symbiodiniaceae aside from the recent chromosome-level assemblies for Symbiodinium microadriaticum (64) and Breviolum minutum (65).
Genome and gene features of dinoflagellates are highly idiosyncratic and atypical of eukaryotes, in part due to noncanonical splice sites (66).Therefore, the prediction of protein-coding genes from dinoflagellate genomes requires a comprehensive workflow (https://github.com/TimothyStephens/Dinoflagellate_Annotation_Workflow/; accessed 20 January 2022) tailored for these features, guided by high-confidence evidence (67).Here, we adopted a customized workflow integrating the results from multiple methods, guided by available transcript and protein sequences, independently for CCMP2556 and SCF082; see Supplementary Text for details.

Analysis of whole-genome duplication
We first searched for evidence of collinear gene blocks using MCS-canX (68) in intraspecies mode (−b 1) to identify putative duplicate gene blocks within each genome (i.e., segmental duplication and/or WGD), and in interspecies mode (−b 2) to identify syntenic gene blocks between the two genomes.A collinear block is defined as at least five genes conserved in the same orientation and order as a result of segmental duplication and/or WGD events.For each comparison, all-versus-all BLASTP search results were restricted to the top five hits (query or subject coverage >50%; E ≤ 10 −5 ).Predicted genes from each genome were classified using duplicate_gene_classifier (within MCScanX) into singleton, dispersed duplicates (i.e., duplicates separated by >20 genes), proximal duplicates (i.e., duplicates separated by <20 genes), tandem duplicates, and WGD/segmental duplicates (i.e., ohnologs).
Second, we assessed the reconciliation between each gene tree and the species tree; the topological incongruence between the two trees indicates a history of gene duplication or loss (69).OrthoFinder v2.3.10 (70) was first used to infer homologous gene sets among Suessiales species using BLASTP (E ≤ 10 −5 ).Multiple sequence alignments were performed with MAFFT v7.487 (71) (-linsi), from which phylogenetic trees were inferred using FastTree v2.1.11 (72) at default parameters.Reconciliation of the gene tree and species tree within OrthoFinder was then used to identify lineage-specific duplication events; those specific to D. trenchii indicative of WGDderived duplicated genes (i.e., ohnologs).
Third, we assessed the impact of WGD on the rate of synonymous substitution (K s ) among all homologous gene sets, using CCMP2556 as the reference, following the wgd pipeline (73).Briefly, homologous protein clusters were inferred using a Markov clustering algorithm (74) from the previous all-versus-all BLASTP search (used for MCScanX) and aligned using MAFFT (71).A phylogenetic tree for each homologous protein cluster was inferred using FastTree2 (72) and used to estimate K s values for each cluster using codeml implemented in PAML (75).A Gaussian mixture model was applied to the K s distribution, using a four-component model that provided the best fit for the data according to the Akaike information criterion, yielding a final node-averaged histogram of the K s distribution.To estimate the timing of WGD, we first calculated the estimated substitution rate (r) per year in Symbiodiniaceae adapting the approach of Ladner et al. (30) to incorporate genome data and the updated divergence time estimates from LaJeunesse et al. (1).
We followed Aury et al. (17) to infer metabolic pathways that were preferentially retained in duplicate following WGD using PRIAM v2 (released January 2018).Briefly, we identified metabolic enzymes that had been uniquely retained as ohnologs or singletons.We then compared the proportion of enzymes uniquely retained as ohnologs to singletons, to the background proportion of the number of ohnologs and singletons annotated in the genome.
This tests whether the number of uniquely retained metabolic enzymes for a particular pathway exceeds the background levels that would be expected to occur by random.We additionally required (i) five or more distinct enzymatic proteins to be identified as uniquely retained in either duplicate or singleton and (ii) pathways to be significantly overrepresented in both isolates.The proportion of enzymes coded by genes that were uniquely retained as ohnologs or singletons, compared to their overall proportions in the genome, was used to determine which KEGG pathways (45) were preferentially retained in duplicate following WGD.

Evolution of ohnolog expression
Trimmed RNA-seq reads (above) were mapped to the corresponding genome using HISAT2 v2.2.1 (--concordant-only) with a Hierarchical Graph FM index informed by annotated exon and splice sites.Counts of uniquely mapped paired-end reads overlapping with coding sequence regions were then enumerated using featureCounts --p --countReadPairs --B -C) implemented in Subread v0.2.3 (76).The raw counts were filtered to remove lowly expressed genes using the filtrByExpr function in edgeR.Differential gene expression analysis was performed with edgeR using a generalized linear model.We considered genes to be differentially expressed when the false discovery rate < 0.01 and the absolute value of log 2 (fold change) > 1.We compared the difference between lifestyles at two temperatures, i.e., symbiosis versus free-living at 34°C (L-34) and symbiosis versus free-living at 28°C (L-28), and the response to temperature stress in the two lifestyles, i.e., 34°C versus 28°C at free-living (T-Fr) and 34°C versus 28°C in symbiosis (T-Sy).
A WGCNA was performed on all genes in R using the WGCNA package.The variance of normalized counts was calculated using the standard DESeq2 workflow followed by its varianceStabilizingTransformation.Because symbiosis is a strong driver of expression in Symbiodiniaceae, using the inferred soft-thresholding power for reducing noise and setting a required threshold for gene correlations would have yielded a mean connectivity of more than 4000 at the inferred power of 6.Therefore, a weighted, unidirectional coexpression network was inferred using a power of 18, the recommended value for signed networks with less than 20 samples.Coexpression modules were inferred using the function blockwiseModules that collectively infers signed networks (networkType = "signed, " TOMtype = "signed, " maxBlock-Size = 10,000, corType = "bicor, " maxPoutliers = 0.05, pearsonFallback = "individual, " deepSplit = 2, dcuth = 0.999, minModuleSize = 30, reassignThreshold = 0.1, cutHeight = 0.2).
Expression specificity of ohnologs was assessed using the tau (τ) index (35), where τ = 1 indicates highly specific expression, and τ = 0 indicates broad expression.The log-normalized fragments per kilobase million (FPKM) counts were used to calculate τ index scores for those genes with a log 2 (FPKM +1) > 1 in at least one condition following Yanai et al. (35).The τ indices for the different MCScanX duplication categories were compared using a Kruskal-Wallis rank sum test; pairwise comparisons using Wilcoxon rank sum test with continuity correction and Holm P value adjustment were performed to determine differences between the duplication categories.A chi-square test of all significant τ indices (τ ≥ 0.7) was conducted to assess potential biases in expression specificity for treatments among the duplication categories.

Analysis of posttranscriptional regulation
All-versus-all BLASTN search (query or subject coverage >50%; E ≤ 10 −20 ) was used to identify shared exonic sequences that have been retained since WGD.For inferring DEU within genes among the treatment conditions, gene models were first broken up into exon "counting bins" using the Python script dexseq_prepare_annotation.py from DEXSeq.The relative usage of each exon bin, i.e., the number of transcripts mapping to the bin or to the gene, was then calculated from the HISAT2 BAM file using dexseq_count.py.The DEXSeq R package was then used to infer DEU within genes using a generalized linear model, correcting for significance at the gene level using the Benjamini-Hochberg method (77).
To examine the conservation of splice junctions in ohnolog pairs, all de novo assembled transcripts were first aligned to the genome using a minimap2 v2.20 (58) with code modified to recognize alternative splice sites in dinoflagellates, from which splices sites were identified and annotated using Program to Assemble Spliced Alignments (78).Splice sites categorized as alternative acceptor, alternative donor, alternative exon, retained exon, and skipped exon were retained for subsequent analysis.Each identified splice event was assigned two unique identifiers to represent the upstream and downstream positions of the splice event, along with its gene identifier and genomic location.The upstream and downstream 300-bp regions for each splice event were then extracted using the bedtools v2.30 flank and getfasta functions.An allversus-all BLASTN search of the extracted splice junction sequences was used to identify sequence similarity (E ≤ 10 −5 ) between the sequences.Custom Python scripts were used to filter the BLASTN results to identify conserved splice junctions, in which both upstream and downstream regions for a splice event in two ohnologs were significantly similar (E ≤ 10 −5 ).The splice junction profile for each ohnolog pair was then converted to a binary representation, where the presence of a splice junction in an ohnolog was represented as 1 and the absence of a splice junction represented as 0 (i.e., conserved splice junctions represented as 1 in both ohnologs compared to 0 for those that were not conserved).A Kendall's rank correlation was then conducted in R to identify ohnolog pairs that exhibited a high level of conservation in splice junctions.An exact binomial test was also performed to identify ohnolog pairs that had diverged in terms of total splice junctions (P < 0.05).

Fig. 1 .
Fig. 1.WGD in a facultative coral endosymbiont.(A) Microscopic images of a free-living D. trenchii cell and an Exaiptasia pallida anemone hosting D. trenchii under fluorescence, with red indicating the presence of D. trenchii.(B) Repeat landscapes shown separately for the ccMP2556 and ScF082 genomes.(C) circle plot depicting the location of syntenic blocks containing collinear gene blocks (i.e., ohnologs) between the ccMP2556 and ScF082 genomes.Ribbons indicate syntenic gene blocks identified with McScanX that overlap with putative WGD-duplicated regions in both isolates (blue; n = 2427), one isolate only (red; n = 612), or neither isolate (black; n = 35).(D) the percentage of genes in duplicated collinear gene blocks relative to the number of duplicated collinear gene blocks identified within the genomes of Suessiales species.(E) number of genes and syntenic genes recovered for each gene duplication category for the two isolates.(F) Phylogenetic tree of order Suessiales showing the number of lineage-specific gene family duplications at each node.

Fig. 2 . 28 :Fig. 3 .
Fig. 2. Ohnolog expression post-WGD.(A) three-dimensional scatterplot of the five groups of ohnologs pairs based on their pattern of differential expression (De), i.e., pairs for which neither gene showed De (group 1; blue), only one showed De (group 2; orange), both ohnologs showed De at the same time in the same manner (group 3; green), both ohnologs showed De but at different times (group 4; purple), and both ohnologs showed De at the same time but in opposing directions (group 5; red).(B) Pearson's correlation coefficients showing the correlation of expression patterns between each ohnolog pair within each of the five groups.(C) exon gain/loss between each ohnologs pair within each of the five groups.heatmaps depicting the normalized gene expression (z-score) for (D) groups 2 and (E) 5, in which ohnologs were clustered on the basis of their shared expression pattern.the assignment of gene copies (ohnolog A versus ohnolog B) for each ohnolog pair is arbitrary, based solely on common expression patterns.

•Fig. 4 .
Fig. 4. Model of divergence post-WGD in a facultative endosymbiont.Putative selective constraints faced by free-living and symbiotic Symbiodiniaceae under the dual lifestyle are shown, with a focus on post-WGD ohnolog sequence divergence and differential gene expression.